Matrix-vector multiplication: Parallel algorithms and architectures
Authors
Abstract
Similar resources
Parallel Graph Algorithms with In-database Matrix-Vector Multiplication
Graph problems are significantly harder to solve with large graphs residing on disk compared to main memory only. In this work, we study how to solve four important graph problems: reachability from a source vertex, single source shortest path, weakly connected components, and PageRank. It is well known that the aforementioned algorithms can be expressed as an iteration of matrix-vector multipl...
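As a concrete illustration of the iteration pattern described above, the following minimal Python/NumPy sketch computes PageRank by repeatedly multiplying a rank vector by a transition matrix. It is not the paper's in-database implementation; the function name, the adjacency-dict input, and the damping parameter d are choices made here for the example.

import numpy as np

def pagerank(graph, d=0.85, tol=1e-10, max_iter=100):
    """graph: dict mapping each vertex to a list of its out-neighbours."""
    nodes = sorted(graph)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    # Column-stochastic transition matrix M: M[j, i] = 1/outdeg(i) if i -> j.
    M = np.zeros((n, n))
    for v, outs in graph.items():
        if outs:
            for w in outs:
                M[idx[w], idx[v]] = 1.0 / len(outs)
        else:
            M[:, idx[v]] = 1.0 / n   # dangling vertex: spread its mass uniformly
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = d * (M @ r) + (1.0 - d) / n   # one matrix-vector product per iteration
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return dict(zip(nodes, r))

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))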
Fast Matrix Multiplication Algorithms on MIMD Architectures
Sequential fast matrix multiplication algorithms of Strassen and Winograd are studied; the complexity bound given by Strassen is improved. These algorithms are parallelized on MIMD distributed memory architectures of ring and torus topologies; a generalization to a hyper-torus is also given. Complexity and efficiency are analyzed and good asymptotic behaviour is proved. These new parallel algor...
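For reference, here is a minimal Python/NumPy sketch of Strassen's recursion itself (seven half-size products instead of eight), assuming square matrices whose order is a power of two; the improved bound and the MIMD ring/torus parallelization studied in the paper are not reproduced, and the cutoff parameter is an illustrative choice.

import numpy as np

def strassen(A, B, cutoff=64):
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                      # ordinary product below the cutoff
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Strassen's seven half-size products
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C

A = np.random.rand(128, 128); B = np.random.rand(128, 128)
assert np.allclose(strassen(A, B), A @ B)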
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallel...
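To make the kernel concrete, here is a minimal scalar Python sketch of sparse matrix-vector multiplication in the common CSR (compressed sparse row) layout, a usual starting point for such tuning work; it shows the irregular, row-dependent memory access pattern but none of the GPU formats or auto-tuning heuristics from the paper.

import numpy as np

def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x for A stored as (row_ptr, col_idx, vals) in CSR format."""
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):   # nonzeros of row i
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 3x3 example: rows [2 0 1; 0 3 0; 4 0 5], x = ones -> [3 3 9]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals    = [2.0, 1.0, 3.0, 4.0, 5.0]
print(csr_spmv(row_ptr, col_idx, vals, np.ones(3)))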
SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
We accelerate double-precision sparse matrix and DD vector multiplication (DD-SpMV), and its transposed-matrix counterpart (DD-TSpMV), using SIMD AVX2 for Krylov subspace methods. We compare several storage formats for DD-SpMV and DD-TSpMV under AVX2 to eliminate the performance degradation factors of CRS. Our experience indicates that BCRS4x1, with a block size that fits the SIMD register...
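The sketch below illustrates, in plain scalar Python, an SpMV over a BCRS-style layout with 4x1 blocks (four rows per block, one column), the block shape the abstract matches to a 256-bit AVX2 register; the double-double arithmetic and the intrinsics are omitted, and the data-layout names are illustrative assumptions.

import numpy as np

def bcrs4x1_spmv(blk_row_ptr, blk_col_idx, blk_vals, x):
    """y = A @ x for A stored as 4x1 blocks: blk_vals[k] is an array of 4
    values covering rows 4*i .. 4*i+3 of block row i in column blk_col_idx[k],
    zero-padded; the row count is assumed to be a multiple of 4."""
    n_blk_rows = len(blk_row_ptr) - 1
    y = np.zeros(4 * n_blk_rows)
    for i in range(n_blk_rows):
        acc = np.zeros(4)                                # one SIMD register's worth
        for k in range(blk_row_ptr[i], blk_row_ptr[i + 1]):
            acc += blk_vals[k] * x[blk_col_idx[k]]       # 4 multiply-adds at once
        y[4 * i:4 * i + 4] = acc
    return y

# 4x4 example: rows [2 0 0 0; 0 3 0 0; 1 0 4 0; 0 0 0 5], x = ones -> [2 3 5 5]
blk_row_ptr = [0, 4]
blk_col_idx = [0, 1, 2, 3]
blk_vals = [np.array([2., 0., 1., 0.]), np.array([0., 3., 0., 0.]),
            np.array([0., 0., 4., 0.]), np.array([0., 0., 0., 5.])]
print(bcrs4x1_spmv(blk_row_ptr, blk_col_idx, blk_vals, np.ones(4)))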
Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms
Extra memory allows parallel matrix multiplication to be done with asymptotically less communication than Cannon’s algorithm and be faster in practice. “3D” algorithms arrange the p processors in a 3D array, and store redundant copies of the matrices on each of p^(1/3) layers. “2D” algorithms such as Cannon’s algorithm store a single copy of the matrices on a 2D array of processors. We generalize the...
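As background for that comparison, here is a serial Python/NumPy simulation of Cannon's “2D” algorithm on a q x q block grid; the 2.5D replication across processor layers that the paper introduces is not shown, and the function name and grid size q are choices made for this example.

import numpy as np

def cannon_matmul(A, B, q):
    """Multiply n x n matrices on a simulated q x q process grid (q divides n)."""
    n = A.shape[0]
    b = n // q
    # Partition into a q x q grid of b x b blocks.
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b] for j in range(q)] for i in range(q)]
    Cb = [[np.zeros((b, b)) for _ in range(q)] for _ in range(q)]
    # Initial skew: shift block row i of A left by i, block column j of B up by j.
    Ab = [[Ab[i][(j + i) % q] for j in range(q)] for i in range(q)]
    Bb = [[Bb[(i + j) % q][j] for j in range(q)] for i in range(q)]
    for _ in range(q):
        for i in range(q):
            for j in range(q):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]          # local block product
        # Shift all A blocks left by one and all B blocks up by one.
        Ab = [[Ab[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        Bb = [[Bb[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return np.block(Cb)

A = np.random.rand(6, 6); B = np.random.rand(6, 6)
assert np.allclose(cannon_matmul(A, B, 3), A @ B)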
Journal
Journal title: Computers & Mathematics with Applications
Year: 1988
ISSN: 0898-1221
DOI: 10.1016/0898-1221(88)90262-3